Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech
Abstract
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as ‘kansei’ in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and shifts the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
Key words: speech synthesis, corpora, concatenation, paralinguistic information, communication, affect
Similar resources
Synthesis and evaluation of conversational characteristics in speech synthesis
Conventional synthetic voices can synthesise neutral read-aloud speech well. But to make synthetic speech more suitable for a wider range of applications, the voices need to express more than just the word identity. We need to develop voices that can partake in a conversation and express, e.g., agreement, disagreement, and hesitation in a natural and believable manner. In speech synthesis there ar...
Towards conversational speech synthesis; lessons learned from the expressive speech processing project
This paper discusses some ideas for the requirements and methods of conversational speech synthesis, based on experience gained from the collection and analysis of a very large corpus of conversational speech in a variety of real-life everyday contexts. It shows that because variation in voice quality plays a significant part in the transmission of interpersonal and affect-related social inform...
Natural-sounding Speech Synthesis Using Variable-length Units
The goal of this work was to develop a speech synthesis system which concatenates variable-length units to create natural-sounding speech. Our initial work in this area showed that by careful design of system responses to ensure consistent intonation contours, natural-sounding speech synthesis was achievable with word- and phrase-level concatenation. In order to extend the flexibility of this fram...
Evaluating expressive speech synthesis from audiobooks in conversational phrases
Audiobooks are a rich resource of large quantities of natural-sounding, highly expressive speech. In our previous research we have shown that it is possible to detect different expressive voice sty...
Synthesis Units for Conversational Speech - Using Phrasal Segments -
This paper describes the use of phrase-sized segments for the concatenative synthesis of conversational speech and discusses the differences in selection criteria that become necessary when the source corpus contains several years of conversational speech samples. It claims that natural-sounding conversational speech can be reproduced by use of such phrase-sized chunks for concatenation, and tha...
Journal: IEICE Transactions
Volume: 88-D
Publication year: 2005